Experiments on applying relaxation labeling to map multilingual hierarchies
This paper explores the automatic construction of a multilingual
Lexical Knowledge Base from preexisting lexical resources, presenting
a new approach for linking already existing hierarchies. The
relaxation labeling algorithm is used to select, among all the
candidate connections proposed by a bilingual dictionary, the right
connection for each node in the taxonomy.
Postprint (published version)
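As an illustration of how relaxation labeling can choose one connection per node, here is a minimal sketch using a standard iterative update. Each candidate label's weight is multiplied by (1 + support) and then renormalized. This is not the paper's implementation. The node names, candidate synsets, and the compatibility function `hierarchy_compat` are hypothetical. `hierarchy_compat` scores +1 when a source IS-A link is mirrored in the target hierarchy and 0 otherwise.

```python
from typing import Callable, Dict, List

# compat(i, l, j, m) in [-1, 1]: agreement of label l on node i with label m on node j.
Compat = Callable[[str, str, str, str], float]


def relaxation_labeling(candidates: Dict[str, List[str]],
                        compat: Compat,
                        iterations: int = 50) -> Dict[str, str]:
    """Pick one label per node by iteratively reinforcing mutually compatible labels."""
    # Start from uniform weights over each node's candidate labels.
    weights = {n: {l: 1.0 / len(ls) for l in ls} for n, ls in candidates.items()}
    for _ in range(iterations):
        updated = {}
        for i, labels in candidates.items():
            neighbours = [j for j in candidates if j != i]
            unnorm = {}
            for l in labels:
                # Support: expected compatibility with the other nodes' current labels,
                # averaged over neighbours so it stays in [-1, 1].
                support = sum(compat(i, l, j, m) * weights[j][m]
                              for j in neighbours for m in candidates[j])
                if neighbours:
                    support /= len(neighbours)
                unnorm[l] = weights[i][l] * (1.0 + support)
            total = sum(unnorm.values())
            # Keep the old weights if every label lost all support (degenerate case).
            updated[i] = ({l: v / total for l, v in unnorm.items()}
                          if total > 0 else dict(weights[i]))
        weights = updated
    return {n: max(w, key=w.get) for n, w in weights.items()}


# Hypothetical toy data: Spanish source nodes, English candidate synsets proposed
# by a bilingual dictionary, and the hypernym links of the target hierarchy.
SOURCE_PARENT = {"perro": "animal"}
TARGET_PARENT = {"dog#1": "animal#1", "dog#3": "person#1"}
CANDIDATES = {"perro": ["dog#1", "dog#3"], "animal": ["animal#1"]}


def hierarchy_compat(i, l, j, m):
    # +1 when a source IS-A link between i and j is mirrored by l and m in the target.
    if SOURCE_PARENT.get(i) == j and TARGET_PARENT.get(l) == m:
        return 1.0
    if SOURCE_PARENT.get(j) == i and TARGET_PARENT.get(m) == l:
        return 1.0
    return 0.0


mapping = relaxation_labeling(CANDIDATES, hierarchy_compat)
print(mapping)  # {'perro': 'dog#1', 'animal': 'animal#1'}
```

The averaged support keeps each factor (1 + support) non-negative, so weights stay valid probabilities after renormalization.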
Multilingual knowledge resources for wide–coverage semantic processing
This report presents a wide survey of publicly available multilingual
Knowledge Resources that could be of interest for wide-coverage semantic processing
tasks, with the aim of giving a clear picture of their current state. We also include
an empirical evaluation, in a multilingual scenario, of the relative quality of some
of these large-scale knowledge resources. The study covers a wide range of manually
and automatically derived large-scale knowledge resources for English and Spanish.
To establish a fair and neutral comparison, the quality of each knowledge resource is
evaluated indirectly, using the same method, on a Word Sense Disambiguation (WSD) task:
the Senseval-3 English Lexical Sample Task.
This work was partially funded by the IXA group of the UPV/EHU and the KNOW
(TIN2006-15049-C03-01) and ADIMEN (EHU06/113) projects.
Multilingual evaluation of KnowNet
This paper presents a new fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method uses a wide-coverage and accurate knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate senses to large sets of topically related words acquired from the web. KnowNet, the resulting knowledge base, which connects large sets of semantically related concepts, is a major step towards the autonomous acquisition of knowledge from raw corpora. In fact, KnowNet is several times larger than any available knowledge resource encoding relations between synsets, and the knowledge it contains outperforms any other resource when empirically evaluated in a common multilingual framework.
Peer Reviewed. Postprint (published version)
SemEval-2007 Task 16: evaluation of wide coverage knowledge resources
This task tries to establish the relative quality of available semantic resources (derived by manual or automatic means). The quality of each large-scale knowledge resource is indirectly evaluated on a Word Sense Disambiguation task. In particular, we use the Senseval-3 and SemEval-2007 English Lexical Sample tasks as evaluation benchmarks for the relative quality of each resource. Furthermore, to be as neutral as possible with respect to the knowledge bases studied, we systematically apply the same disambiguation method to all the resources. The resources show completely different behaviour on the two lexical data sets (Senseval-3 and SemEval-2007).
Peer Reviewed. Postprint (author’s final draft)
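The evaluation design holds the disambiguation method fixed and varies only the resource, so accuracy differences can be attributed to the resources. Here is a minimal sketch of that design. It assumes, for illustration only, that each resource maps a word sense to a set of related words and that the shared method picks the sense with the largest word overlap with the context. The scorer, resources, and data are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A resource maps each word sense to a set of related words (hypothetical format).
Resource = Dict[str, Set[str]]
Example = Tuple[List[str], List[str], str]  # (context words, candidate senses, gold sense)


def disambiguate(context: List[str], senses: List[str], resource: Resource) -> str:
    """Pick the sense whose related words overlap most with the context."""
    ctx = set(context)
    # max() keeps the first sense on ties, so sense order acts as a tie-breaker.
    return max(senses, key=lambda s: len(resource.get(s, set()) & ctx))


def accuracy(resource: Resource, dataset: List[Example]) -> float:
    """Apply the same disambiguation method to one resource and score it."""
    correct = sum(disambiguate(ctx, senses, resource) == gold
                  for ctx, senses, gold in dataset)
    return correct / len(dataset)


RESOURCE_A = {"bank#1": {"river", "water", "shore"}, "bank#2": {"money", "deposit", "loan"}}
RESOURCE_B = {"bank#1": {"money"}, "bank#2": set()}  # a deliberately weaker resource
DATASET = [
    (["deposit", "money", "account"], ["bank#1", "bank#2"], "bank#2"),
    (["river", "fishing", "water"], ["bank#1", "bank#2"], "bank#1"),
]

print(accuracy(RESOURCE_A, DATASET), accuracy(RESOURCE_B, DATASET))  # 1.0 0.5
```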
KnowNet: A proposal for building highly connected and dense knowledge bases from the web
This paper presents a new fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method uses a wide-coverage and accurate knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate senses to large sets of topically related words acquired from the web.
KnowNet, the resulting knowledge base, which connects large sets of semantically related concepts, is a major step towards the autonomous acquisition of knowledge from raw corpora. In fact, KnowNet is several times larger than any available knowledge resource encoding relations between synsets, and the knowledge it contains outperforms any other resource when empirically evaluated in a common multilingual framework.
Peer Reviewed. Preprint (author's version)
Highlighting relevant concepts from Topic Signatures
This paper presents deepKnowNet, a new fully automatic method for building highly dense and accurate knowledge bases from existing
semantic resources. Basically, the method applies a knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate WordNet sense to large sets of topically related words acquired from the web, named TSWEB. This Word Sense Disambiguation algorithm is the personalized PageRank algorithm implemented in UKB. The new method automatically improves the current content of WordNet by creating large volumes of new and accurate semantic relations between synsets. KnowNet was our first attempt towards the acquisition of large volumes of semantic relations; however, it had some limitations that deepKnowNet overcomes. deepKnowNet disambiguates the first hundred words of all Topic Signatures from the web (TSWEB). In doing so, the method highlights the most relevant word senses of each Topic Signature and filters out those that are less related to the topic. In fact,
the knowledge it contains outperforms any other resource when empirically evaluated in a common framework based on a similarity
task annotated with human judgements.
Postprint (published version)
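Personalized PageRank ranks the senses of an ambiguous word by random walks that restart only at the senses of the context words. The following is a minimal power-iteration sketch of that idea. It is not UKB's code or graph. The toy synset graph, the damping factor of 0.85, and the function names are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def personalized_pagerank(graph: Dict[str, List[str]], seeds: List[str],
                          damping: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """Power iteration for PageRank with teleportation restricted to `seeds`.

    `graph` is an undirected adjacency list in which every node has at least one neighbour.
    """
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            # Each node spreads its current rank evenly over its neighbours.
            share = damping * rank[n] / len(graph[n])
            for nb in graph[n]:
                new_rank[nb] += share
        rank = new_rank
    return rank


# Hypothetical toy synset graph.
EDGES: List[Tuple[str, str]] = [
    ("bank#2", "money#1"), ("bank#2", "loan#1"), ("money#1", "loan#1"),
    ("bank#2", "deposit#1"), ("money#1", "deposit#1"), ("bank#1", "river#1"),
]
graph: Dict[str, List[str]] = {}
for a, b in EDGES:
    graph.setdefault(a, []).append(b)
    graph.setdefault(b, []).append(a)

# Context words "money" and "loan": restart the walk at their senses only.
ranks = personalized_pagerank(graph, seeds=["money#1", "loan#1"])
best = max(["bank#1", "bank#2"], key=ranks.get)
print(best)  # bank#2
```

Because no node here is dangling, each iteration preserves total rank mass, so the scores sum to 1.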
A Proposal for word sense disambiguation using conceptual distance
This paper presents a method for the resolution of lexical ambiguity and its
automatic evaluation over the Brown Corpus. The method relies on the use of
the wide-coverage noun taxonomy of WordNet and the notion of conceptual
distance among concepts, captured by a Conceptual Density formula developed
for this purpose. This fully automatic method requires no hand-coding of
lexical entries, no hand-tagging of text and no training process of any kind. The
results of the experiment have been automatically evaluated against SemCor,
the sense-tagged version of the Brown Corpus.
Postprint (published version)
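To make the idea of a density-based score concrete, here is a minimal sketch over a toy noun taxonomy. It chooses the target sense whose enclosing subhierarchy packs the most context-word senses into the fewest concepts. The formula is an assumption, since the abstract does not give it. It follows the form commonly cited for Conceptual Density, CD(c, m) = sum over i < m of nhyp^(i^0.2) divided by the number of concepts under c, where nhyp is the mean number of hyponyms per node. The paper's exact definition may differ. The taxonomy, the sense names, and the rule of skipping subhierarchies with no context support are also hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy noun taxonomy: parent -> children.
CHILDREN: Dict[str, List[str]] = {
    "entity": ["geological_formation", "finance"],
    "geological_formation": ["slope", "river#1"],
    "slope": ["bank#1"],
    "finance": ["financial_institution", "money#1", "loan#1"],
    "financial_institution": ["bank#2"],
}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}


def subtree(node: str) -> Set[str]:
    """The node plus all of its descendants."""
    nodes = {node}
    for child in CHILDREN.get(node, []):
        nodes |= subtree(child)
    return nodes


def ancestors(node: str) -> List[str]:
    """The node followed by its hypernyms up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def conceptual_density(root: str, marks: int) -> float:
    """Assumed form: sum_{i<marks} nhyp**(i**0.2) / |subtree(root)|."""
    nodes = subtree(root)
    internal = [n for n in nodes if CHILDREN.get(n)]
    nhyp = sum(len(CHILDREN[n]) for n in internal) / len(internal) if internal else 1.0
    return sum(nhyp ** (i ** 0.2) for i in range(marks)) / len(nodes)


def disambiguate(target_senses: List[str], context_senses: List[List[str]]) -> Optional[str]:
    best_sense, best_cd = None, float("-inf")
    for sense in target_senses:
        for root in ancestors(sense):
            nodes = subtree(root)
            # Marks: the target plus each context word with a sense in this subhierarchy.
            marks = 1 + sum(any(s in nodes for s in senses) for senses in context_senses)
            if marks < 2:  # assumption: ignore subhierarchies with no context support
                continue
            cd = conceptual_density(root, marks)
            if cd > best_cd:
                best_sense, best_cd = sense, cd
    return best_sense


print(disambiguate(["bank#1", "bank#2"], [["money#1"], ["loan#1"]]))  # bank#2
```

In this toy case the small "finance" subhierarchy covers all three words and scores higher than the whole taxonomy rooted at "entity". That is the intuition behind preferring dense subhierarchies.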
The MEANING Project
Although progress is being made in Natural Language Processing (NLP), there is still a long way to go towards Natural Language Understanding. An important step towards this goal is the development of technologies and resources that deal with concepts rather than words. However, to build the next generation of intelligent open-domain Human Language Technology (HLT) application systems, we need to solve two complementary intermediate tasks: Word Sense Disambiguation (WSD) and the automatic large-scale enrichment of Lexical Knowledge Bases.
The MEANING Project is funded by the EU 5th Framework IST Programme.
Asignación automática de etiquetas de dominios en WordNet (Automatic assignment of domain labels in WordNet)
This paper describes a process to automatically assign WordNet Domains labels to WordNet glosses. One of the main goals of this work is to enrich lexical sources with WordNet information. WordNet Domains is used as the knowledge source. Finally, domain labels for the nominal and verbal parts of WordNet are proposed and verified.
This article was partially funded by the European Commission (MEANING IST-2001-34460), the Generalitat de Catalunya (2002FI 00648) and the Universidad Tecnológica Metropolitana (Chile).
Exploring the automatic selection of basic level concepts
We present a very simple method for selecting
Base Level Concepts using basic structural properties of WordNet. We also empirically demonstrate that this automatically derived set of
Base Level Concepts groups senses at an adequate level of abstraction for
class-based Word Sense Disambiguation. In fact,
a very naive Most Frequent classifier using the
selected classes performs semantic tagging with accuracy above 75%.
Funded by the European Union under the QALL-ME project (FP6 IST-033860) and by the Spanish Government under the Text-Mess (TIN2006-15265-C06-01) and KNOW (TIN2006-15049-C03-01) projects.
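A Most Frequent classifier over classes simply maps each training sense to its class and, for every word, predicts the class seen most often. Here is a minimal sketch of that classifier. The sense-to-class table stands in for an already selected set of Base Level Concepts and is hypothetical. The selection of the classes from WordNet's structure is not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from fine-grained senses to their Base Level Concept.
SENSE_TO_CLASS = {
    "church#1": "group#1", "church#2": "building#1",
    "bank#1": "geological_formation#1", "bank#2": "group#1",
}


def train_most_frequent_class(tagged: List[Tuple[str, str]]) -> Dict[str, str]:
    """For each word, remember the class its training senses most often map to."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, sense in tagged:
        counts[word][SENSE_TO_CLASS[sense]] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(words: List[str], model: Dict[str, str]) -> List[Optional[str]]:
    """Assign each word its most frequent class, or None if unseen in training."""
    return [model.get(w) for w in words]


TRAINING = [
    ("church", "church#2"), ("church", "church#2"), ("church", "church#1"),
    ("bank", "bank#2"), ("bank", "bank#1"), ("bank", "bank#2"),
]
model = train_most_frequent_class(TRAINING)
predictions = tag(["church", "bank", "river"], model)
print(predictions)  # ['building#1', 'group#1', None]
```

Counting at the class level rather than the sense level pools evidence from related senses. This is why coarser but well-chosen classes can make even such a naive baseline competitive.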